Interactivity in Visual Analytics
In this take-home exercise 3, we are going to reveal the economic of the city of Engagement, Ohio USA by using appropriate static and interactive statistical graphics methods. We will show how the financial health of the residents changes over the period of time and how the wages are compared against overall cost of living.
The data is processed by using appropriate tidyverse family of packages, whereas the statistical graphics are prepared using ggplot2 and its extensions.
Before we get started, it is important for us to ensure that the required R packages have been installed. If yes, we will load the R packages. If they have yet to be installed, we will install the R packages and load them onto R environment.
The packages required for this exercise are tidyverse, ggiraph, plotly, gganimate, zoo, trelliscopejs, gifski and gapminder.
packages = c('tidyverse','ggiraph','plotly','gganimate','zoo','trelliscopejs','gifski','gapminder')
for(p in packages){
if(!require(p, character.only = T)){
install.packages(p)
}
library(p, character.only = T)
}
The original dataset was obtained from VAST Challenge 2022 in csv format. It shows the financial report of 1011 residents of Engagement, OH that have agreed to participate in this study from March 2022 until May 2023.
The code chunk below imports a dataset called
FinancialJournal.csv into R by using read_csv() of
readr and saves it as tibble data frame called
financial. It consists of 1,856,330 records as shown below.
financial <- read_csv("data/FinancialJournal.csv")
head(financial)
# A tibble: 6 x 4
participantId timestamp category amount
<dbl> <dttm> <chr> <dbl>
1 0 2022-03-01 00:00:00 Wage 2473.
2 0 2022-03-01 00:00:00 Shelter -555.
3 0 2022-03-01 00:00:00 Education -38.0
4 1 2022-03-01 00:00:00 Wage 2047.
5 1 2022-03-01 00:00:00 Shelter -555.
6 1 2022-03-01 00:00:00 Education -38.0
As we are not interested in time, we will only pick up the month and
year by using as.yearmon() of zoo and
rename the new column as yearmonth.
The financial journal consists of 6 categories namely Education, Food, Recreation, Rent Adjustment, Shelter and Wage. We are going to sum them up and group them by participant ID, yearmonth and category.
financial <- financial %>%
mutate(yearmonth = as.yearmon(timestamp)) %>%
group_by(participantId, yearmonth, category) %>%
summarise(Total = sum(amount))
head(financial)
# A tibble: 6 x 4
# Groups: participantId, yearmonth [2]
participantId yearmonth category Total
<dbl> <yearmon> <chr> <dbl>
1 0 Mar 2022 Education -76.0
2 0 Mar 2022 Food -268.
3 0 Mar 2022 Recreation -349.
4 0 Mar 2022 Shelter -1110.
5 0 Mar 2022 Wage 11932.
6 0 Apr 2022 Education -38.0
In this code chunk below, we are going to create a new table called expenses to reveal the living cost of the residents by excluding Wage and Rent Adjustment from category column.
Next, we are going to use pivot_wider() of
dplyr to create new columns based on the amount of each
category, and rename the new table as fin_wide.
We will also strip all NA values by specifying
na.rm = TRUE and calculate the living cost, revenue and
savings.
Living cost = Education + Food + Recreation + Shelter
Revenue = Wage + Rent Adjustment
Savings = Revenue - Living cost
fin_wide <- financial %>%
pivot_wider(names_from = "category",
values_from = "Total") %>%
rowwise() %>%
mutate(LivingCost = sum(c_across(c(1:4)),na.rm=T)) %>%
mutate(Revenue = sum(c_across(c(5:6)),na.rm=T)) %>%
mutate(Savings = sum(c_across(c(7:8)),na.rm=T))
head(fin_wide)
# A tibble: 6 x 11
# Rowwise: participantId, yearmonth
participantId yearmonth Education Food Recreation Shelter Wage
<dbl> <yearmon> <dbl> <dbl> <dbl> <dbl> <dbl>
1 0 Mar 2022 -76.0 -268. -349. -1110. 11932.
2 0 Apr 2022 -38.0 -266. -219. -555. 8637.
3 0 May 2022 -38.0 -265. -383. -555. 9048.
4 0 Jun 2022 -38.0 -257. -466. -555. 9048.
5 0 Jul 2022 -38.0 -270. -1070. -555. 8637.
6 0 Aug 2022 -38.0 -262. -314. -555. 9459.
# ... with 4 more variables: RentAdjustment <dbl>, LivingCost <dbl>,
# Revenue <dbl>, Savings <dbl>
Trellis display below shows how monthly savings of each participant change from March 2022 to May 2023.
qplot(yearmonth, Savings, data = fin_wide) +
labs(x = "Year & Month",
y = "Savings") +
theme_bw() +
facet_trelliscope(~ participantId,
nrow = 2, ncol = 5,
width = 450,
path = "trellis/",
self_contained = TRUE)